-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dataflow: Simplify references to access paths from prior stage. #18258
Dataflow: Simplify references to access paths from prior stage. #18258
Conversation
Dca is looking good - fairly uneventful. There doesn't appear to be any regressions, and there's possibly a few small performance improvements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice simplification 🎉 . Two questions.
exists(boolean emptyAp | | ||
flowIntoCallApaTaken(call, c, arg, p, emptyAp) and | ||
fwdFlow(arg, _, _, _, _, ap, _, _) and | ||
if ap instanceof ApNil then emptyAp = true else emptyAp = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern is repeated in many places, so perhaps move out into a separate predicate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I also took the opportunity to rename a couple of predicates that were now somewhat misleading as their name included reference to the removed ApApprox
column.
@@ -171,10 +171,12 @@ edges | |||
| B.java:154:17:154:28 | source(...) : String | B.java:175:5:175:6 | String s2 : String | provenance | | | |||
| B.java:158:7:158:13 | parameter this : MyLocal [String s1] : String | B.java:159:18:159:19 | this : MyLocal [String s1] : String | provenance | | | |||
| B.java:158:7:158:13 | parameter this : MyLocal [String s2] : String | B.java:160:14:160:15 | this : MyLocal [String s2] : String | provenance | | | |||
| B.java:158:7:158:13 | parameter this : MyLocal [String s2] : String | B.java:160:14:160:15 | this : MyLocal [String s2] : String | provenance | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these test changes expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. When a callable has both flow-through and flow to a sink without flow-through and the difference is dependent on the access path (and not just if it is empty or not), then we generate slightly more nodes, since we can no longer determine with the same level of precision whether we need to remember a summary context or not. To pinpoint it in the code, this is due to the change to PrevStage::parameterMayFlowThrough
using an emptyAp
boolean instead of ApApprox
in its use in callEdgeArgParamRestricted
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know that this is expected.
We carry
ApApprox
columns through a lot of predicates for a number of joins, which then might filter a little bit more, but ultimately most of this filtering will be achieved by the main filtering infwdFlow1
where we join withPrevStage::revFlow
.MRVA testing shows that the number of additional tuples calculated in
fwdFlow
+revFlow
is completely negligible by removing these early joins. The worst case I found yielded an additional 0.1% tuples, but all other cases had much smaller increases. So it's quite plausible that the additional data in all of these columns cost more performance than they provide. And if so, this will be a nice simplification.Commit-by-commit review is recommended.